This paper will review seven of the measurement development activities of the
Northwest Evaluation Association. The overall system that some school districts
in the states of Oregon and Washington are trying to develop and implement is
the one described by Doherty and Hathaway (1972) in NCME's publication, Measurement
in Education. The overall system contains three cross-referenced banks (a
collection of course level curriculum goals; a collection of calibrated goal
referenced measurement items/scales; and a collection of goal-referenced instructional
strategies), the second of which is the major concern of the Northwest
Evaluation Association.

For the past several months, members from the Northwest Evaluation Association
have been empirically testing the Rasch model to explore its usefulness in providing
scale statistics for content referenced measures. The Rasch model was
examined to see if it would make the following possible:

(1) the identification of "problem" items;

(2) the identification of scaled scores between tests;

(3) the identification of the relationship between the content referenced
tests and existing norm referenced tests; and

(4) the scaling of individual items for inclusion in an "item pool."

Theoretical Framework

The Rasch test model offers a promising approach to the scaling of tests, but as
yet has seen only limited application in actual school settings. The most important
aspect of the model is the interval scaling it provides for both test scores
and individual test items on the underlying latent dimension. Using the interval
scale provided by the model, it is theoretically possible to equate different
tests or alternate forms of the same test by including a set of common items between
the tests. In addition, the Rasch analysis provides an indication of those
items which perform "questionably" and may be inappropriate or invalid. The
present paper was intended to take advantage of the many desirable features of
the Rasch model in analyzing content referenced tests developed to assess specific
basic skills learning goals.
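For readers less familiar with the model, the dichotomous Rasch model gives the probability of a correct answer as a logistic function of the difference between a student's ability and an item's difficulty, both on one logit scale. The minimal sketch below assumes that form; the function name and the example values are hypothetical. The difficulties quoted later in this paper (45, 55, and so on) suggest a linear rescaling of logits, but the scale actually used is not stated, so the sketch stays in logits.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct answer under the dichotomous Rasch model.

    Ability and difficulty are in logits on one common scale; only their
    difference matters.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical item (difficulty 0.5 logits) and two hypothetical group abilities.
item_difficulty = 0.5
for label, ability in [("slower group", -1.0), ("brighter group", 1.5)]:
    percent = 100 * rasch_probability(ability, item_difficulty)
    print(f"{label}: expected percent correct {percent:.1f}")
```

The expected percent correct differs sharply between the two groups, yet both values come from a single difficulty parameter. This is the property the studies that follow rely on when linking tests.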

Studies

The first study was spearheaded by Drs. Hathaway and Forster of Portland Public
Schools and involved a field test of 200 reading and 200 mathematics items, approximately
evenly divided between the fourth and eighth grades. The items were arranged
into 26 "interlocking" tests so that each test shared approximately ten items with
a preceding and a succeeding test. This arrangement of the items was intended to
make it possible to use the Rasch methodology to link together the analyses of
each test. Each test form was given to approximately 250 students to ensure as
much stability as possible in the results. There were no other conditions, however,
with regard to the students taking a test, since earlier studies (Wright 1967;
Stiles 1974) had demonstrated that the Rasch model yielded virtually identical results
for bright students and slow students.
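An interlocking design of this kind can be sketched as a chain of forms in which each form repeats the last few items of the one before it. The function name, the form length of 30 and the 100-item list below are hypothetical. The ten shared items follow the text. The paper does not say how items were ordered before being assigned to forms, so the sketch simply chains whatever list it is given.

```python
def build_interlocking_forms(item_ids, form_length, n_link):
    """Split an item list into a chain of overlapping test forms.

    Consecutive forms share n_link items: the last n_link items of one form
    are the first n_link items of the next, so every adjacent pair of forms
    can later be linked through those common items.
    """
    if not 0 < n_link < form_length:
        raise ValueError("need 0 < n_link < form_length")
    step = form_length - n_link  # new (unshared) items per additional form
    forms = []
    start = 0
    while True:
        forms.append(item_ids[start:start + form_length])
        if start + form_length >= len(item_ids):
            break
        start += step
    return forms


# Hypothetical: 100 items, forms of 30 items, 10 link items between neighbours.
forms = build_interlocking_forms(list(range(100)), form_length=30, n_link=10)
for number, form in enumerate(forms, start=1):
    print(f"form {number}: items {form[0]}-{form[-1]}")
```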

Each test was analyzed separately using the Rasch technique. First, each item
was screened with respect to its discrimination and its fit to the Rasch model.
The discrimination index used was the familiar point biserial correlation between
the item and the raw score. The fit to the model is based on the discrepancy between
the actual percent of students at each achievement level who get the item
correct and the predicted percent specified by the Rasch equations. It was found that
the two methods generally identified the same items as defective, but that the
Rasch fit-to-the-model approach did pick up some item defects missed by the point
biserial. In particular, the fit appeared to be more sensitive to items which
gave cues to the less able students that enabled them to guess the right answer,
and to subtle item defects that were only recognized by the most able students and
therefore caused them to miss the item.

After eliminating the faulty items, the next step was to "link" together the
analyses from the separate tests. This was accomplished by using the characteristic
of the Rasch model that each item in a test is assigned its specific difficulty
level, regardless of the students taking the test. Thus, while the average percent
correct on an item differs dramatically between groups of slower students and
brighter students, the Rasch difficulty value assigned to that item remains constant
(i.e., within the error of the estimate). Since this is also true of the same items
when they appear on different tests, the organization of the tests (described above)
made it possible to determine the average Rasch difficulty value for the same items
appearing in the different tests. This, in turn, made it possible to equate the
difficulties for the items which the tests did not have in common. For example,
assume that test A and test B shared ten items whose average Rasch difficulty value
was 55 on test A and 45 on test B. In equating the items on test B to the items on
test A, it would be necessary to add 10 to the Rasch difficulty value for each item
on B. Similarly, if tests B and C had ten items in common and the average difficulty
was 50 on B and 45 on C, then adding 5 would put the C items on the B scale, and it
would be necessary to add 5 + 10 = 15 to the Rasch difficulty of the C items to
equate them to A. Following this procedure it was possible to develop single
difficulty scales for fourth grade reading, fourth grade mathematics, eighth grade
reading and eighth grade mathematics.
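The screening and linking steps just described can be sketched in a few functions. All names are hypothetical. The point biserial uses its textbook formula. The paper does not give its exact fit formula, so the standardized observed-minus-expected residual per achievement group is an assumed, commonly used form, and the group abilities are assumed to be already estimated. The linking constant reproduces the A, B, C worked example. Its common-item values are invented and cut below ten items for brevity, but they have the stated averages.

```python
import math
from statistics import mean, pstdev


def point_biserial(item_scores, raw_scores):
    """Point biserial correlation between a 0/1 item score and the raw score."""
    right = [r for x, r in zip(item_scores, raw_scores) if x == 1]
    wrong = [r for x, r in zip(item_scores, raw_scores) if x == 0]
    p = len(right) / len(item_scores)
    return (mean(right) - mean(wrong)) / pstdev(raw_scores) * math.sqrt(p * (1 - p))


def fit_residuals(difficulty, groups):
    """Standardized (observed - expected) correct counts per achievement group.

    Assumption: groups is a list of (group_ability, n_students, n_correct),
    with abilities already estimated on the same logit scale as difficulty.
    Large positive residuals in low groups suggest cueing; large negative
    residuals in high groups suggest a defect only able students notice.
    """
    residuals = []
    for ability, n, correct in groups:
        p = 1.0 / (1.0 + math.exp(-(ability - difficulty)))
        residuals.append((correct - n * p) / math.sqrt(n * p * (1 - p)))
    return residuals


def link_shift(common_on_reference, common_on_new):
    """Constant added to the new test's difficulties to put them on the reference scale."""
    return mean(common_on_reference) - mean(common_on_new)


# Worked example from the text (hypothetical item values with the stated means).
shift_b_to_a = link_shift([50, 60, 55, 52, 58], [40, 50, 45, 42, 48])  # 55 - 45
shift_c_to_b = link_shift([50, 52, 48], [45, 47, 43])                  # 50 - 45
shift_c_to_a = shift_c_to_b + shift_b_to_a                             # chain C -> B -> A
print(shift_b_to_a, shift_c_to_b, shift_c_to_a)
```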

In a second study, an effort has been made to relate the difficulty of these items
to the items in the current Portland citywide testing program. Since the citywide
test was locally developed by districts in the Portland metropolitan area and is free
of publishers' copyrights, the completion of this research program would free us
to use our items as a flexible item pool. This, in turn, would make it possible to
develop several shorter tests aimed at specific achievement levels that would yield
higher reliability and a smaller error of measurement than our current survey tests.
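Once items sit on one calibrated scale, a short test aimed at a given achievement level can be drawn from the pool. The paper does not describe a selection rule. The sketch below assumes the simplest one, picking the items whose difficulty is closest to the target level. A working program would also balance content, for example by learning goal. The item identifiers and difficulties are hypothetical.

```python
def assemble_targeted_test(item_bank, target_difficulty, n_items):
    """Choose the n_items calibrated items closest in difficulty to the target.

    item_bank maps an item id to its difficulty on the common scale. The
    selected items are returned in order of increasing difficulty.
    """
    ranked = sorted(item_bank, key=lambda item: abs(item_bank[item] - target_difficulty))
    return sorted(ranked[:n_items], key=item_bank.get)


# Hypothetical calibrated pool (logits).
bank = {"M01": -1.8, "M02": -0.9, "M03": -0.2, "M04": 0.3, "M05": 1.1, "M06": 2.0}
print(assemble_targeted_test(bank, 0.0, 3))
print(assemble_targeted_test(bank, 1.5, 3))
```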

It should be mentioned here that an extensive effort has been made to relate each
item to a learning goal which curriculum staff have indicated it measures. This
procedure would make it possible to check the content coverage of a test as well as
its measurement characteristics, and to add calibrated items to "fill in the gaps."
In this latter case, the Rasch model shows promise in making it possible to equate the
new test to the old test and maintain continuity of the normative data reported to
the public. Experimentation with this equating aspect of the Rasch analysis is still
in progress.



In another vein, for a third study, two cooperative projects have been undertaken
with the Parkrose (Oregon) and Tacoma School Districts to equate the Portland
Basic Skills mathematics items to mathematics tests developed by them. By employing
a design similar to that used in the original field test, each district has
developed tests containing its own items and ten of the Portland items. When the
results of their test administration are available, it will be possible to equate
the Rasch difficulties for all the items on a single scale. As well as enlarging
the item pool, this will make the items of each district, along with the available
normative data on these tests, available to the other districts. In this way, it is
hoped it will be possible to explore cross-district comparisons based on a sound
foundation.
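A district's separately calibrated items could be brought into the common pool through the shared Portland items, as in this minimal sketch. The function name, item identifiers and difficulties are hypothetical. Keeping the anchor items at their existing pool values, rather than averaging the two calibrations, is a design choice assumed here, not something the paper specifies.

```python
from statistics import mean


def merge_district_items(bank, district_calibration, anchor_ids):
    """Place a district's separately calibrated items on the pool's scale.

    bank: item id -> difficulty on the common (Portland) scale.
    district_calibration: item id -> difficulty from the district's own
        analysis, including the anchor items it shares with the bank.
    Anchor items keep their existing bank values (an assumption).
    """
    anchors = set(anchor_ids)
    shift = mean(bank[a] for a in anchors) - mean(district_calibration[a] for a in anchors)
    merged = dict(bank)
    for item, difficulty in district_calibration.items():
        if item not in anchors:
            merged[item] = difficulty + shift
    return merged


# Hypothetical values: two anchor items (P1, P2) and one district item (T1).
bank = {"P1": 0.0, "P2": 1.0, "P3": 2.0}
district = {"P1": -0.5, "P2": 0.5, "T1": 1.5}
merged = merge_district_items(bank, district, ["P1", "P2"])
print(merged)
```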


A fourth major test development effort was spearheaded by Drs. Forbes and Ingebo
of Portland Public Schools, and has focused on mathematics at the seventh grade
level. Approximately 250 items were available from a pool of items developed for
the seventh grade Portland Metropolitan Area Test. These items had been previously
field tested, and percent right information was available for each item. Using
these data, the items were arranged into four interlocking tests of increasing
difficulty. The links between tests were also included in three additional tests,
which made it possible to cross-validate the link between each pair of tests. Each
of these seven tests was given to approximately 300 students ranging from sixth
graders to ninth graders, based on the relative difficulty of the test.
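Percent right from an earlier field test can give a rough starting difficulty for each item, which is enough to order items into tests of increasing difficulty. The sketch assumes the log-odds of missing the item, a common first approximation before a full Rasch calibration. The paper does not say this transform was used. The item identifiers and percent-right values are hypothetical.

```python
import math


def initial_difficulty(p_correct):
    """Log-odds of missing the item: a rough starting value for Rasch difficulty."""
    if not 0.0 < p_correct < 1.0:
        raise ValueError("proportion right must be strictly between 0 and 1")
    return math.log((1.0 - p_correct) / p_correct)


def order_by_difficulty(percent_right):
    """Item ids from easiest to hardest, using prior field-test proportions right."""
    return sorted(percent_right, key=lambda item: initial_difficulty(percent_right[item]))


# Hypothetical field-test proportions right.
field_test = {"A": 0.85, "B": 0.40, "C": 0.62}
print(order_by_difficulty(field_test))
```

The ordered list could then be split into overlapping forms so that each test is somewhat harder than the one before it.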

The tests were analyzed in much the same manner as that described for the Portland
Basic Skills tests. In addition, the tests were analyzed separately by subtest
(computation, concepts, and problem solving). The comparison of the overall links
and these subtest links is the subject of an important paper being presented by
Dr. Forbes at this conference. The results of the overall links between these
tests were extremely encouraging. It was shown that the values used to equate
items between the four basic tests agreed closely with the cross-validation values.
This study established both the single scale for item difficulties and the robustness
of the Rasch linking procedure in a single operation.
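One way to cross-validate a link is to compare the direct linking constant between two basic tests with the constant obtained by chaining through an additional test. The sketch assumes that the additional test shares one set of items with test 1 and a different set with test 2. If it repeated exactly the same link items, the chained constant would equal the direct one algebraically and would check nothing. All difficulties are hypothetical; the paper reports only that the values "agreed closely."

```python
from statistics import mean


def link_shift(on_reference, on_new):
    """Constant that puts the new test's difficulties on the reference test's scale."""
    return mean(on_reference) - mean(on_new)


# Direct link: the same items as calibrated in basic test 1 and basic test 2.
direct = link_shift([0.8, 1.1, 1.4], [-0.4, -0.1, 0.3])

# Validation link through an additional test X (hypothetical logits):
# items shared by test 1 and X, as calibrated in test 1 and in X ...
x_to_1 = link_shift([0.2, 0.6, 0.9], [-0.3, 0.1, 0.4])
# ... and different items shared by X and test 2, as calibrated in X and in test 2.
two_to_x = link_shift([1.5, 1.8, 2.0], [0.8, 1.0, 1.4])
via_x = two_to_x + x_to_1  # test 2 -> X -> test 1

print(f"direct {direct:.2f}, via X {via_x:.2f}, difference {abs(direct - via_x):.2f}")
```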



In a fifth study, Dr. Forbes has initiated an even more ambitious effort designed
to explore the feasibility of using the Rasch analysis in designing shorter tests
focused at a student's performance level to replace our current survey testing
program. He has designed seven interlinking tests, each with approximately 2
items, arranged to represent increasing difficulty levels and balanced to represent
computation, concepts and problem solving. These tests will be administered with
the regular survey test in the spring of 1975 and the results analyzed to compare
the reliability of these shorter, more specific tests with that of the general survey
tests. This study will have significant implications for all future test
development activities which the Northwest Evaluation Association will undertake.
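The reason a short, focused test can measure more precisely than a broad one can be seen from the Rasch standard error, which is one over the square root of the test information, the sum of p(1 - p) over items. Information is largest when item difficulty matches the student's ability. The sketch compares two hypothetical 20-item tests for a student at 0 logits. The item sets are illustrative assumptions, not the fifth study's design, whose results were not yet available.

```python
import math


def rasch_standard_error(ability, difficulties):
    """Approximate standard error of measurement at a given ability (logits).

    Test information is the sum of p * (1 - p) over items; the standard
    error is 1 / sqrt(information).
    """
    information = 0.0
    for d in difficulties:
        p = 1.0 / (1.0 + math.exp(-(ability - d)))
        information += p * (1.0 - p)
    return 1.0 / math.sqrt(information)


student = 0.0
targeted = [0.0] * 20    # hypothetical: 20 items at the student's level
off_target = [3.0] * 20  # hypothetical: 20 items far too hard for this student
se_targeted = rasch_standard_error(student, targeted)
se_off_target = rasch_standard_error(student, off_target)
print(f"targeted SE {se_targeted:.3f}, off-target SE {se_off_target:.3f}")
```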



In a sixth study, the Metropolitan Area Test Planning Board (supported by several
school districts in the Portland metropolitan area), under the leadership of Dr.
Forbes, has undertaken a project designed to Rasch calibrate the previously developed
survey tests. The mechanics of this study closely resemble the previously described
studies (i.e., the development of overlapping tests administered to approximately
300 students each). This project will provide valuable data concerning the
capability of the Rasch model to equate scores from several tests to the same interval
scale. This is extremely important since there is no existing adequate methodology
for combining student scores on different tests. In addition, this study will make
it possible to compare the norms developed for these tests across grades and possibly
use them to establish checkpoints on continuous developmental scales in
reading and mathematics.
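Once the items of several tests are calibrated on one scale, a raw score on any of them can be converted to an ability on that scale. The sketch uses the standard maximum-likelihood Rasch estimate, which solves for the ability at which the expected raw score equals the observed one. It is not necessarily the procedure used in the sixth study. The two short tests and their difficulties are hypothetical.

```python
import math
from statistics import mean


def ability_for_raw_score(raw_score, difficulties, tol=1e-10, max_iter=100):
    """Maximum-likelihood Rasch ability (logits) for a raw score on a calibrated test.

    Solves sum_i P_i(theta) = raw_score with damped Newton-Raphson steps.
    Zero and perfect scores have no finite estimate and are rejected.
    """
    n = len(difficulties)
    if not 0 < raw_score < n:
        raise ValueError("raw score must lie strictly between 0 and the number of items")
    # Starting value: mean item difficulty plus the log-odds of the raw score.
    theta = mean(difficulties) + math.log(raw_score / (n - raw_score))
    for _ in range(max_iter):
        probs = [1.0 / (1.0 + math.exp(-(theta - d))) for d in difficulties]
        expected = sum(probs)
        information = sum(p * (1.0 - p) for p in probs)
        # Newton step, limited to one logit to avoid overshooting.
        step = max(-1.0, min(1.0, (raw_score - expected) / information))
        theta += step
        if abs(step) < tol:
            return theta
    raise RuntimeError("ability estimate did not converge")


# Hypothetical calibrated difficulties for two short tests on one common scale.
harder_test = [-1.0, -0.5, 0.5, 1.0]
easier_test = [-2.0, -1.5, -0.5, 0.0]
print(ability_for_raw_score(2, harder_test), ability_for_raw_score(2, easier_test))
```

The same raw score of 2 maps to a lower ability on the easier test, which is what allows scores from different tests to be combined on one interval scale.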

Through the cooperative efforts of Leonard Winchell and William Conley of Tacoma
Public Schools, Dr. Stiles conducted the seventh study. In this study an attempt
was made to link a locally developed 48-item content referenced arithmetic computation
test with a commercially developed norm referenced test. In the fall of
1974, fourth, fifth and sixth grade pupils were administered both the Mathematics
Management By Objectives Test (MAMBO) and the Form Q2 Arithmetic Section of the
CTB/McGraw-Hill Comprehensive Tests of Basic Skills (CTBS).

Initially, correlations were computed between the item difficulties obtained through
the Rasch analysis at each pair of grade levels. The correlations ran from
0.900 to 0.976, showing that the Rasch item difficulty index maintained the stability
of item difficulty from grade level to grade level. In addition, Rasch Achievement
Scale Indices were calculated for each subtest of the CTBS and compared to the
publisher's Expanded Scale Scores, which were produced through the Thurstone absolute
scaling procedure. The correlations from this comparison ranged from 0.985
to 0.998, showing that the Rasch procedure is comparable with the Thurstone absolute
scaling procedure.

In the second phase of the study, Rasch scale parameters were estimated for the
MAMBO and each of the three arithmetic subtests of the CTBS. Linking equations
were then developed to equate each of these measures to one another and to the
total arithmetic scale of the CTBS.
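The paper does not give the form of the linking equations. Because the same pupils took both the MAMBO and the CTBS, one possibility is a common-person linear link. The sketch swaps in the general mean-sigma method, which matches the means and standard deviations of the two sets of measures. Under the Rasch model, with a common logit unit, the estimated slope should come out near 1, so the slope also serves as a check on the model. The ability values are hypothetical.

```python
from statistics import mean, pstdev


def linear_link(scores_new, scores_reference):
    """Mean-sigma linking from the same students' measures on two tests.

    Returns (slope, intercept) so that reference = intercept + slope * new.
    """
    slope = pstdev(scores_reference) / pstdev(scores_new)
    intercept = mean(scores_reference) - slope * mean(scores_new)
    return slope, intercept


# Hypothetical Rasch abilities (logits) for five pupils on each test.
mambo = [-0.8, -0.2, 0.3, 0.9, 1.4]
ctbs = [-0.5, 0.0, 0.6, 1.1, 1.8]
slope, intercept = linear_link(mambo, ctbs)
print(f"CTBS = {intercept:.2f} + {slope:.2f} * MAMBO")
```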

Additionally, Dr. Stiles is heading an activity sponsored by the Washington State
Office of the Superintendent of Public Instruction to identify the basic skills
measures (both commercial and non-commercial) currently administered in the states
of Washington and Oregon, along with dates and places of administration. Volunteers
for additional content referenced measurement testing will be obtained, and from
this a plan will be developed for linking these basic skills measures by content
area during the 1975-76 school year.

Summary

Based on the analysis of data cited in this paper, it appears that the Rasch model
has met the criteria set for it. The items indicated as poorly fitting the model
have evidenced defects which warrant revision or elimination. The linking equations
between tests appear to provide consistent estimates of item performance and test
scaling, based on the cross-validation of results. Finally, these data support
the feasibility of obtaining estimates of individual item difficulties which are
consistent across testing situations.

The efforts of the Northwest Evaluation Association provide considerable optimism
about the usefulness of the Rasch test analysis model in solving a variety of important
school testing problems. It appears that the model can provide the necessary
link between content referenced and norm referenced tests. Support is also offered
for the feasibility of scaling each item independent of student performance, and
independent of the scaling of all other items. This latter result may provide the
basis for a fully flexible content referenced testing program providing the information
now only available through norm referenced tests.